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1 Introduction to intractable likelihood models for high frequency 
hnancial market dynamics 

In this chapter we consider classes of models that have been recently developed for quantitative 
finance that involve modelling a highly complex multivariate, multi-attribute stochastic process 
known as the Limit Order Book (LOB). The LOB is the primary data structure recorded each day 
intra-daily for all assets on every electronic exchange in the world in which trading takes place. As 
such, it represents one of the most important fundamental structures to study from a stochastic 
process perspective if one wishes to characterize features of stochastic dynamics for price, volume, 
liquidity and other important attributes for a traded asset. In this paper we aim to adopt the 


model structure recently proposed by Panayi and Peters 2015 , which develops a stochastic model 


framework for the LOB of a given asset and to explain how to perform calibration of this stochastic 
model to real observed LOB data for a range of different assets. One can consider this class of 
problems as truly a setting in which both the likelihood is intractable to evaluate pointwise, but 
trivial to simulate, and in addition the amount of data is massive. This is a true example of big-data 
application as for each day and for each asset one can have anywhere between 100,000-500,000 data 
vectors for the calibration of the models. 

The class of calibration techniques we will consider here involves a Bayesian ABC reformulation 
of the indirect inference framework developed under the multi-objective optimization formulation 
proposed recently by Panayi and Peters [2015 . To facilitate an equivalent comparison for the 
two frameworks, we also adopt a reformulation of the class of genetic stochastic search algorithms 

2015| , known as NGSA-II Deb et ah, 2002 . We adapt this widely 


utilised by Panayi and Peters 


utilised stochastic genetic search algorithm from the multi-objective optimization algorithm litera¬ 
ture to allow it to be utilised as a mutation kernal in a class of Sequential Monte Carlo Samplers 
(SMC Sampler) algorithms in the ABC context. We begin with the problem and model formula- 


1 

















tion, then we discuss the estimation frameworks and finish with some real data simulation results 
for equity data from a highly utilised pan-European secondary exchange formerly known as Chi-X, 
before it was recently aquired by BATS. 


1.1 Introduction to the LOB and related multi-queue simulation models 


The structure of a financial market dictates the form of interaction between buyers and sellers. 
Markets for financial securities generally operate either as quote driven markets, in which specialists 
(dealers) provide 2-way prices, or order driven markets, in which participants can trade directly 
with each other by expressing their trading interest through a central matching mechanism. The 
LOB is an example of the latter, and indicatively, the Helsinki, Hong Kong, Shenzhen, Swiss, Tokyo, 
Toronto, and Vancouver Stock Exchanges, together with Euronext and the Australian Securities 
Exchange operate as pure LOBs, while the New York Stock Exchange, NASDAQ, and the London 
Stock Exchange also operate a hybrid LOB system Gould et al., 2013 , with specialists operating 
for the less liquid securities. 

We will consider trading activity in the context of the LOB in this chapter. Market participants 
are typically allowed to place two types of orders on the venue: Limit orders, where they specify 
a price over which they are unwilling to buy (or a price under which they are unwilling to sell), 
and market orders, which are executed at the best available price. Market orders are executed 
immediately, provided there are orders of the same size on the opposite side of the book. Limit 
orders to buy (sell) are only executed if there is opposite trading interest in the order book at, or 
below (above), the specified limit price. If there is no such interest, the order is entered into the 
limit order book, where orders are displayed by price, then time priority. 
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Eigure 1: An example of the state of the Chi-X order book. The left hand side of the book is the 
buying interest, while the right hand side is the selling interest. The highest bid (order to buy) is 
for 100 shares at 2700 cents, while there are two lowest offers (orders to sell) for 70 and 100 shares 
at 2702. Orders are prioritised by price, then time. 


Figure shows an example snapshot of the order book for a particular stock, as traded on the 
Chi-X exchange, at a particular instance of time. A market order to buy 200 shares would result in 
3 trades: 70 shares at 2702, another 100 shares at 2702 and the remaining 30 at 2704. A limit order 
to sell 300 shares at 2705, on the other hand, would not be executed immediately, as the highest 
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order to buy is only at 2700 cents. It would instead enter the limit order book on the right hand 
side, second in priority at 2705 cents after the order for 120 shares which is already in the book. 

LOB simulation models aim to generate the trading interactions observed in such a LOB, and 
allow for a realistic description of the intra-day trading process. In particular, the models simulate 
the behaviour of individual market participants, usually based on the behaviour of various classes 
of real traders. The price of the traded hnancial asset is then determined from the limit and 
market orders submitted by these traders. Depending on the model, the instantaneous price is 
either considered to be the mid-point between the highest bid price and lowest ask price, or the 
last traded price. 

Because of the practical interest of modelling the intra-day dynamics of both stock prices and 
available volumes in the LOB, as well as the difficulty of traditional economic models based on 
rationality to reproduce these dynamics, there have been a multitude of research approaches at¬ 
tempting to address this gap. One one hand, there have been the zero-intelligence approaches (e.g. 
Maslov [2000]), which generally consist of a single, unsophisticated type of trading agent who sub¬ 


mits orders randomly, possibly subject to (very few) constraints, like budget considerations. This 
minimum amount of discipline imposed by their actions is therefore often sufficient to reproduce 
some commonly observed features of financial markets, such as fat tails in the distribution of returns 
[Maslov 2000 . Later models [LiCalzi and Pellizzari 2003, Fricke and Lux 2013 also considered 


more realistic trading behaviour, where agents act based on their perceived value of the financial 
asset. However, LiCalzi and Pellizzari [2003 noted that these additional considerations regarding 
agent behaviour did not necessarily lead to a more realistic stock price output, and that it was 
likely the imposed market structure that had the largest effect on reproducing the price features 
observed. 

One other prominent strand of research pertains to the modelling of the LOB as a set of queues 
at each price and on both the buy (bid) and sell (ask) side. In the models of Cont et al. 2010 and 
Cont and De Larrard 2013 , a power law characterising the intensities of the limit order processes 


is found to be a good emprical fit. However, their assumption of independence between the activity 
at each level and for each event type is unlikely to hold in modern LOBs, due to the presence of 
algorithmic trading strategies, which induce different types of dependence structures. It is clear 
from observed empirical LOB data that non-trivial dependence structures are present, and as such, 
ignoring these features in the model formulation will result in inadequate representations of the 
stochastic LOB process being modelled. 

In the LOB simulation model introduced in Panayi and Peters 2015 , they attempted to provide 
both a richer description of the LOB market structure and its constituent agents, but also consider 
the dependence between the trading activity at different levels. The main components of the 
proposed model are detailed in the following sections, before it is extended into an ABC posterior 
formulation for estimation and inference purposes. 


2 Bayesian models with intractable likelihoods for high frequency 
financial market dynamics 

In this section we develop a new class of Bayesian models that can be utilised to study the dynamics 
of Limit Order Books (LOB) intra-daily. We start by presenting the stochastic multivariate order 
flow model, which we develop as a “stochastic representative agent” model framework. This is then 
reformulated as an intractable likelihood stochastic model and developed into an Approximate 
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Bayesian Computational (posterior model). 

2.1 Limit Order Book agent-based model 

We consider the intra-day LOB activity in fixed intervals of time ... ,[t — l,t), [t, t -|- 1),.... For 
every interval [t,t -|- 1), we allow the total number of levels on the bid or ask sides of the LOB to 
be dynamically adjusted as the simulation evolves. These LOB levels are dehned with respect to 
two reference prices, equal to p^’\ and i.e. the price of the highest bid and lowest ask price at 
the start of the interval. We consider these reference prices to be constant throughout the interval 
— and thus, the levels on the bid side of the book are defined at integer number of ticks away 
from , while the levels on the ask side of the book are defined at integer number of ticks away 

r &,1 

irom Pf_i- 

This does not mean that we expect the best bid and ask prices to remain constant, just that we 
model the activity (i.e. limit order arrivals, cancellations and executions) according to the distance 
in ticks from these reference prices during this period. We note that it is of course possible that 
the volume at the best bid price is consumed during the interval, and that limit orders to sell 
are posted at this price, which would be considered at 0 ticks away from the reference price. To 
allow for this possibility, we actively model the activity at —ld + 1, ■ ■ ■ ,0,... ,lp ticks away from each 
reference price. Here, the p subscript will refer to passive orders, i.e. orders which would not lead to 
immediate execution, if the reference prices remained constant and d refers to direct, or aggressive 
orders, where it is again understood that they are aggressive with respect to the reference prices 
at the start of the period. Therefore, we actively model the activity at a total k = lp + Id levels on 
the bid and ask. 

We assume that activity that occurs further away is uncorrelated with the activity close to 
the top of the book, and therefore unlikely to have much of an impact on price evolution and the 
properties of the volume process. Therefore, the volume resting outside the actively modelled LOB 
levels {—Id + 1,..., 0,..., /p) on the bid and ask is assumed to remain unchanged until the agent 
interactions brings those levels inside the band of actively modelled levels, at which time they will 
again dynamically evolve. Such a set of assumptions is consistent with observed stylized features 
of all LOB for all modern electronic exchanges. 

To present the details of the simulation framework, including the stochastic model components 
for each agent, i.e. liquidity providers and liquidity demanders, we first define the following notation: 

1. ^ - the random vector for the number of orders resting at each level 

on the ask side at time t at the actively modelled levels of the LOB at time t; 

2. ,..., - the random vector for the number of limit orders 

entering the limit order book on the ask side at each level in the interval [t — 1, t); 

3. ..., - the random vector for the number of limit orders cancelled on 

the ask side in the interval [t — 1, t); 

4. - the random variable for the number of market orders submitted by liquidity deman¬ 
ders in the interval [t — 1, t). 

We consider the processes for limit orders and market orders, as well as cancellations to be 
linked to the behaviour of real market participants in the LOB. In the following, we model the 
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aggregation of the activity of 2 classes of liquidity motivated agents, namely liquidity providers and 
liquidity demanders. As we model LOB activity in discrete time intervals, we process the aggregate 
activity at the end of each time interval in the following order: 

1. Limit order arrivals - passive - by the liquidity provider agent; 

2. Limit order arrivals - aggressive or direct - by the liquidity provider agent; 

3. Cancellations by the liquidity provider agent; 

4. Market orders by the liquidity demander agent. 

The rationale for this ordering is that the vast majority of limit order submissions and cancella¬ 
tions are typically accounted for by the activity of high-frequency traders, and many resting orders 
are cancelled before slower traders can execute against them. In addition, such an ordering allows 
us to condition on the state of the LOB, so that we do not have more cancellations at a particular 
level than the orders resting at that level. This is generally appropriate, as the time interval we 
consider can be made as small as desired for a given simulation. 


2.1.1 Stochastic agent representation: liquidity providers and demanders 


We assume liquidity providers are responsible for all market-making behaviour (i.e. limit order 
submissions and cancellations on both the bid and ask side of the LOB). After liquidity is posted 
to the LOB, liquidity seeking market participants, such as mutual funds using some execution 
algorithm, can take advantage of the resting volume with market orders. For market makers, 
achieving a balance between volume executed on the bid and the ask side can be prohtable. However, 
there is also the risk of adverse selection, i.e. trading against a trader with superior information. 
This may lead to losses if, e.g. a trader posts multiple market orders that consume the volume on 
several levels of the LOB. The risk of adverse selection as a result of asymmetric information is 
one of the basic tenets of market microstructure theory [O’hara 1995 . To reduce this risk, market 


makers cancel and resubmit orders at different prices and/or different sizes. 


Definition 1 (Limit order submission process for the liquidity provider agent) Consider 
the limit order submission proeess of the liquidity provider agent to include both passive and ag¬ 
gressive limit orders on the bid and ask sides of the book, which are assumed to have the following 
stochastic model structure: 

1. Let the multivariate path-space random matrix G be constructed from random 


vectors for the numbers of limit order placements = \^N 

Furthermore, assume these random vectors for the number of orders at each level at time 
t are each conditionally dependent on a latent stochastic process for the intensity at which 
the limit orders arrive, given by the random matrix G and on the path-space 


jLO^k j^LO 




LO,k 


by = A 


,A/ 


In the following, k G {a, b} indicates the respective 


process on the ask and bid side. 


2. Assume the conditional independence property for the random vectors given by, 


NfO,k\^LO,k 


X 


N, 


LO,k I j^LO^k 


, Vs / L s,t G {1,2,... ,T}. 


( 1 ) 
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3. For each time interval [t — l,t) from the start of trading on the day, let the random vector for 
the number of new limit orders placed in each actively modelled level of the limit order book, 
i.e. the price points corresponding to ticks {—Id + 1,..., 0,1,..., Ip), be denoted by = 

(jY^LO,A:,-Zd+i^ ^ ^ ^ ^ and assume that these random vectors satisfy the conditional 

independence property 


j^LO,k,s ^j^LO,k,s 


X 


j^LO,k,q^j^LO,k,q 


Vs ^ -5, ^ G { 1, . . . , 0, 1, . . . , Zp} . 


( 2 ) 


4- Assume the random vector G is distributed according to a multivariate generalized 

Cox process with conditional distribution QCV given by 


Pr 


( ]\TkO,k, — ld-\-l _ T\jkO,k^lp _ 

I iV^ — fil, . . . , iv^ — 


nit 


LO,k _ X LO,fc\ _ T-rl 


= At 


=n:’L 






\ LO.k.s 

exp 


(3) 


5. Assume the independence property for random vectors of latent intensities unconditionally 
according to 

Vs/t, s,te{l,2,...,T]. (4) 

6. Assume that the intensity random vector G IR^J. is obtained through an element-wise 

transformation of the random vector G where for each element we have the mapping 


L LO^k^s 

h 


= ho 


LO,k,s 


Fir 


^LO^k,s 

t 


(5) 


where we have s G {—+ 1,...,/p}, baseline intensity parameters G IR+ and a 

strictly monotonic mapping F : IR i—)> [0,1]. 

7. Assume the random vector G IR is distributed according to a multivariate skew-t dis¬ 
tribution ^ MSt{m^, (3^, ,Tf) with location parameter vector G skewness 

parameter vector (3^ G degrees of freedom parameter G IKI+ andkxlt covariance matrix 

Yf. Hence, density function 


j exp( 7 t-m'=)^[s'=] ^f 3 ’<= 

f^Lo,k ^- 


^(i/'=+Q(')'t,m'=))[/3'=]^[s'=] ^/3'= 


_Ci+h 


Xit 


1 I ^ 

' , .A: 


where Ky{z) is a modified Bessel function of the second kind given by 


1 

Kv{z) = - / y^-^e-^2(y+y-Ady, 

Z Jo 


( 6 ) 

(7) 


and c is a normalisation constant. We also define the function Q{-, •) as follows: 


k\T 




1 -1 


( 7 t - vnfi). 


( 8 ) 
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This model admits skew-t marginals and a skew-t copula, see Smith et al. 20121 for details. 
Importantly, this stochastic model admits the following scale mixture representation. 


rio,fc 4 ^ ^ Viyz, (9) 

( k k \ 

and independent Gaussian random 

vector Z ~ (O, . 

8. Assume that for every element of order counts from the random vector , there 

is a corresponding random vector G of order sizes. We assume that the 

element 0^f’^’^,i G |l,..., is distributed as ~ H{-). Furthermore, we 

assume that order sizes are unconditionally independent X for i ^ i', s s' 

and t ^ t'. 


Remark 2.1 Under this proposed model for market maker liquidity activity, the number of limit 
orders placed by the liquidity providers in the market has an appropriate dynamic intensity structure 
that can evolve intra-daily to reflect the changing nature of liquidity provided by market makers 
throughout the trading day. In addition, the number of limit orders placed at each level of the bid 
and ask also allow for the model to capture the observed dependence structures in order placements 
in each level of the bid and ask regularly seen in empirical analysis of high-frequency LOB data. 
The dependence structure utilised is based on a skew-t copula which allows non-exchangeability of 
the stochastic intensity on the bid and ask at each level of the book, as well as asymmetry in the tail 
dependence features. This means that when large movements by market makers to replenish liquidity 
after a liquidity drought occurs intra-daily, such as after a large market order execution, the model 
can produce such replenishment on just the bid or just the ask depending on the situation. It does 
not automatically replenish both sides of the book equally likely, as would occur under a standard 
t-copula structure as opposed to a skew-t copula structure used in this model. 


We now define the second component of the liquidity provider agents, namely the cancellation 
process. The cancellation process has the same stochastic process model specification as the limit 
order submission process above, including a skew-t dependence structure between the stochastic 
intensities at each LOB level on the bid and ask. We therefore only specify the differences unique to 
the cancellation process relative to the order placement model definition in the below specification, 
to avoid repetition. 

Definition 2 (Limit order cancellation process for liquidity provider agent) Consider the 
limit order cancellation process of the liquidity provider agent to have an identically specified stochas¬ 
tic model structure as the limit order submissions. The exception to this pertains to the assumption 
that the number of cancelled orders in each interval at each level is right-truncated at the total 
number of orders at that level. 

1. As for submissions, we assume for cancellations a multivariate path-space random matrix 
C k I 'x.T 

G lNl_f. constructed from random vectors for the number of cancelled orders given 
by ..., ATy. Furthermore, assume for these random vectors for 

the number of cancelled orders at each of the It levels, the latent stochastic process for the 
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intensity is given by the random matrix G and given on the path-spaee by = 

A C,k A C,k A C,k 

■''■1 ’ ■''■2 ) • • • ) -''■T 


hxT 


C,k 


2. Assume that for the random vector for the volume resting in the LOB after the place- 


LO,k 


and that the random vector n9'^ G IKI^ 


ment of limit orders we have = Vf_i + N, 
is distributed according to a truncated multivariate generalized Cox process with conditional 


distribution Nf’ \Vf = ^ ~ QCV (A^’ j < v) (with v = , vi^^)) given by 




Ar = A^^F/=^ = 


n 

=-id+i i^j=. 




i=o 




r- 


( 10 ) 


3. Assume that for the cancellation count , the orders with highest priority are cancelled 

from level s (which are also the oldest orders in their respective queue). Assume also that 
cancellations always remove an order in full, i.e. there are no partial cancellations. 


Remark 2.2 Cancellations are a critical part of a market makers ability to modulate and adjust 
their liquidity activity to avoid large losses in trades that would otherwise he executed under an 
adverse selection setting. Under this proposed model for market maker liquidity removal activity 
(cancellations), the number of limit orders cancelled by the liquidity providers in the market has an 
appropriate dynamic intensity structure that can evolve intra-daily to reflect the changing nature of 
liquidity demand throughout the trading day. In addition, the number of limit orders cancelled at 
each level of the bid and ask also allow for the model to capture the observed dependence structures 
in order cancellations at each level of the bid and ask. The dependence structure utilised is based on 
a skew-t copula which allows non-exchangeability of the stochastic intensity on the bid and ask at 
each level of the book, as well as asymmetry in the tail dependence features. This means that when 
large price movements occur in the LOB, market makers need to adjust their LOB volumes and 
profile by cancelling existing resting orders and creating new orders. This typically occurs many 
times throughout the trading day, and the ability to do this with an appropriate dependence structure 
is critical. In addition, the number of cancelled orders needs to preserve the principle of volume 
preservation, that is the upper bound on the total number of Limit Orders that may be cancelled at 
any given time is based on the instantaneous resting volume in the book at the given time instant. 


We complete the specification of the representative agents by considering the specification of 
the liquidity demander agent. In addition to market makers who are incentivized to place orders 
intra-daily in the limit order book, by exchanges in which they operate, there are also other market 
participants who trade for other reasons. These other market participants include hedge funds, 
pension funds and other types of large investors, typically we refer to such groups of traders as 
liquidity demanders. They absorb liquidity throughout the day by purchasing resting orders in 
the limit order book. These purchases are most often made through market orders or aggressive 
limit orders. In this chapter we assume that all such activities can be adequately modelled by a 
stochastic liquidity demander agent making dynamically evolving decisions to place market orders, 
as specified below. 



Definition 3 (Market order submission process for liquidity demander agent) Consider 
a representative agent for the liquidity providers to he composed of a market order component, 
which has the following stochastic structure: 

1. Assume a path-space random vector G for the number of market orders con¬ 
structed from the random variables for the number of market orders in each interval = 

,..., . Furthermore, assume that for these random variables the la¬ 
tent stochastic process for the intensity is given by random variable G and given 


on the path-space by A^^'^ = A 


MO,k ^MO,k 
2 > 


AMO,k 

, I\rp 


2. Assume the conditional independence property for the random variables 


N. 




X 


N, 


MO,k\ ,MO,k 


|A 


Vs/t, s,t G {1,2,... ,r} . 


( 11 ) 


3. Assume that for the random variable for the volume resting on the opposite side of the LOB 
after the placement of limit orders and cancellations we have 
where k' = a if k = b, and vice-versa, and that the random variable G IKI+ is distributed 


-rrk^^S jyjC fk^S 

^t-At ~ 


according to a truncated generalized Cox process with conditional distribution 


r ~ gCV ( Af 


^ given by 


Pr N. 


j-MO,k 


= n 


^MO,k _ yMO,k 5A: 


.R^t=r] = 




( \ M0,k\ A 

[Xt P 

Xj=o 




( 12 ) 


4- Assume the independence property for random vectors of latent intensities unconditionally 
according to 

Af^^xAf^’^, Vs/t, s,t G {1,2,...,T}. (13) 


5. Assume that for each intensity random variable g |R_i_ there is a corresponding trans¬ 
formed intensity variable g |p the relationship for each element is given by the 

mapping 

(14) 


,MO,k MO,kj:, -^MO,k 

Rt = Fo ’ F{r 


for some baseline intensity parameter £ IR+ strictly monotonic mapping F : P i—)■ 

[ 0 , 1 ]. 

6. Assume that the random variables characterizing the intensity before transfor¬ 

mation of the Ceneralized Cox-Process, are distributed in interval [t — l,t) according to a 
univariate skew-t distribution ^ ^ ^ j^MO,k^ _ 

7. Assume that for every element of market order counts, there is a corresponding ran¬ 
dom vector ^ sizes. We assume that the element OfP’^,i G 

|l,..., ig distributed according to ~ H{-). Assume also that market order 

sizes are unconditionally independent X for i i' or t f^tf. 
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We denote the LOB state for the real dataset at time t on a given day by the random vector 
Lt, and this corresponds to the prices and volumes at each level of the bid and ask. Utilising the 
stochastic agent-based model specification described above, and given a parameter vector 6, which 
will generically represent all parameters of the liquidity providing and liquidity demanding agent 
types, one can then also generate simulations of intra-day LOB activity and arrive at the synthetic 
state (0). The state of the simulated LOB at time t is obtained from the state at time t — 1 
and a set of stochastic components, denoted generically by Xt, which are obtained from a single 
stochastic realisation of the following components of the agent-based models: 


Limit order submission intensities order numbers and order 

O LO.d.s /^LO.h^s 1 _ 7 I 1 7 ' 1 .d^s • -t jLjLO^h^s 

j^ j , where s = —0 + 1... Lp,i = 1... j = 1... ; 




Limit order cancellation intensities A^’^, A^’“ and numbers of cancellations N^’^, 


C,d 


Market order intensities A^^’°‘^ numbers of market orders ]y^o,d^Y^MO,d 

and market order sizes (J-^ ^(J-^ ’ , z = 1... , j = 1... 




These stochastic features are combined with the previous state of the LOB, L*^_^ (0), to produce 
the new state L’l (0) for a given set of parameters 0, given by 


l: {e) = G{LU{0),Xt) 


(15) 


G{-) is a transformation that maps the previous state of the LOB and the activity generated in 
the current step into a new step, much the same way as the matching engine updates the LOB after 
every event. As we model the activity in discrete intervals, however, the LOB is only updated at 
the end of every interval, and the incoming events (limit orders, market orders and cancellations) 
are processed in the order specified in Section |2.1[ Conditional then on a realization of these 
parameters 0, the trading activity in the LOB can be simulated for a single trading day, and the 
complete procedure is described in the algorithm set out in Panayi and Peters [2015 . 


2.2 Bayesian model formulation of the stochastic agent LOB model represen¬ 
tation 

In this section we consider the class of LOB stochastic models developed in the previous section and 
we detail a Bayesian model formulation under an ABC framework. Methods for Bayesian modelling 
in the presence of computationally intractable likelihood functions are of growing interest. These 
methods may arise either because the likelihood is truly intractable to evaluate point-wise, or in 
our case it may be that the likelihood is so complex in terms of model specification and costly 
to evaluate point-wise that one has to resort to alternative methods to perform estimation and 
inference. Termed likelihood-free samplers or Approximate Bayesian Computation (ABC) methods, 
simulation algorithms such as Sequential Monte Carlo Samplers have been adapted for this setting, 
see for instance Peters et al. 2012| . 

We start by recalling a few basics. Typically, Bayesian inference proceeds via the posterior 
distribution, generically denoted by Tr{0\y) oc /(y|0)7r(0), the updating of prior information vr(0) 
for a parameter 6 € E through the likelihood function f{y\6) after observing data y € y. Numerical 
algorithms, such as importance sampling, Markov chain Monte Carlo (MCMC) and sequential 
Monte Carlo (SMC), are commonly employed to draw samples from the posterior 7r(0|y). 
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Remark 2.3 (Note on Data vector) In the context of this chapter the data y is the entire order 
book structure for a given asset over a day as summarized by: 


Limit order submission order numbers , and order sizes ^ where 

_ / I 1 7 * 1 ,CLsS • 1 -\tLO .h.S 

Limit order numbers of cancellations , Nf'°' 


Numbers of market orders ^yMO,b ■yMO,a market order sizes 

MO^a ^MO^b ■ 1 iKjMO,a ■ ^jMO^b 


oit 


o 


The resulting observation vector yt, at time t, is then the concatenation of all these variables. 
These stochastic features are obtained at sampling rate t within the market hours of the trading 
day, typically say every 5-30 seconds for the 8.5 hour trading day, producing a total of between 
1000-6000 vector valued observations per day. 


Clearly, even evaluating a likelihood on this many records, even if it could be written down 
which in many LOB models built on queues like the one in this chapter will not be the case, would 
still be a challenging task. Generically we will denote below the collection of all observations y 
of the LOB for an asset on a given day, and by 9 the set of all parameters that are utilised to 
parameterize the LOB stochastic model. 

There is growing interest in posterior simulation in situations where the likelihood function is 
computationally intractable i.e. fiy\9) may not be numerically evaluated pointwise. As a result, 
sampling algorithms based on repeated likelihood evaluations require modification for this task. 

Collectively known as likelihood-free samplers (and also as approximate Bayesian computation) 
these methods have been developed across multiple disciplines. They employ generation of auxiliary 
datasets under the model as a means to circumvent (intractable) likelihood evaluation. 


2.2.1 Posterior models for computationally intractable likelihoods 

In essence, likelihood-free methods hrst reduce the observed data, y, to a low-dimensional vector 
of summary statistics ty = T{y) G T, where dim(0) < dim(ty) << dim(r/). Accordingly, the true 
posterior 7r(0|y) is replaced with a new posterior T:{9\ty). These are equivalent if ty is sufficient for 
6, and Tr{9\ty) ~ 7r(0|y) is an approximation if there is some loss of information through ty. The 
new target posterior, TT{9\ty), still assumed to be computationally intractable, is then embedded 
within an augmented model from which sampling is viable. Specifically the joint posterior of the 
model parameters 6, and auxiliary data t G T given observed data ty is 

7r(0, t\ty) oc Kh{ty - t)f{t\9)7r{e), (16) 

where t ~ fit\9) may be interpreted as the vector of summary statistics t = T(x) computed from 
a dataset simulated according to the model x ~ f{x\9). Assuming such simulation is possible, 
data-generation under the model, t ~ f{t\9), forms the basis of computation in the likelihood-free 
setting. The target marginal posterior TrMi9\ty) for the parameters 9, is then obtained as 

T^Mi9\ty) = CM j Khity - t)f{t\9)Tr{9)dt (17) 
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where (cm) ^ = Je It ^ hjty — t)f{t\6)Tr{d)dtd9 normalises (17) such that it is a density in 6 
Reeves and Pettitt ; |Wilkinson| [2013] ; ||Blum| 


(e-g. 


2010 ; Sisson et al. [2007 ; Fearnhead 


and Prangle 2012|). The function Kh{ty — t) is a standard kernel function, with scale parameter 


h > 0, which weights the intractable posterior with high density in regions t k, ty where auxiliary 
and observed datasets are similar. As such, 'KM{9\ty) ~ T:{0\ty) forms an approximation to the 
intractable posterior via © through standard smoothing arguments (e.g. |Blum| (2010| ). In the case 
as /i —>• 0, so that Kh{ty — t) becomes a point mass at the origin (i.e. ty = t) and is zero elsewhere, 
if ty is sufficient for 6 then the intractable posterior marginal 'KM{9\ty) = 7r(0|ty) = vr(0|y) is 
recovered exactly (although small h is usually impractical). Various choices of smoothing kernel K 
have been examined in the literature (e.g. Marjoram et al. ] [2003l ; jBeaumont et al.| |2002|; |Peters 
eTalKml] ). 

For our discussion on likelihood-free or ABC samplers, it is convenient to consider a generali¬ 


sation of the joint distribution (16) incorporating 5 > 1 auxiliary summary vectors 

where = (f^,...,f'^) and ~ f[t\6) are S independent datasets generated from the 

, con ditionally independent 
2012 and specify the kernel 


(intractable) model. As the auxiliary datasets are, by construction , 
given 0, we have f{k^\6) = nf=i /(^'^l^)- We follow 


Del Moral et al, 


K as Kh{ty,k'^) = S ^ Kh{ty — t^), which produces the joint posterior 


7rj{0,k^\ty) = cj 


S 


1 ^ K,{t, - (*) 

s=l 


ri/(**i9) 


Ls=l 


7r(0), 


(18) 


with cj > 0 the appropriate normalisation constant, where in (18) we extend the uniform kernel 
choice of K{ty — t^) by 


struc tion, itj{ 6, t 


1:5 


(c.f. Del Moral et al. 


Del Moral et al.| [2012| to the general case. It is easy to see that, by con- 
ty)d t^'^ = 7TM{9\ty) admits the distribution (17) as a marginal distribution 
The case iS = 1 with Trj{d,k^\ty) = Tr{0,t\ty) corresponds to the 


2012 


more usual joint posterior (16) in the likelihood-free setting 


There are two obvious approaches to posterior simulation from 7TM{9\ty) ~ 7r(0|fj;) as an ap¬ 
proximation to 7r(0|y). The first approach proceeds by sampling directly on the augmented model 
7rj(0, realising joint samples (0, € E x before a posteriori marginalisation over 

(i.e. by discarding the t^ realisations from the sampler output). In this approach, the summary 
quantities are treated as parameters in the augmented model. 

The second approach is to sample from TTM{9\ty) directly, a lower dimensional space, by ap¬ 
proximating the integral © via Monte Carlo integration in lieu of each posterior evaluation of 
T^Mi^lty). In this case 


TrM{0\ty) oc 7r(0) / Kh{ty - t)f{t\e)dt 


IT 


(19) 


S=1 


where ~ f{t\6). This expression, examined by various authors (e.g. Marjoram et al 


|2003 ; Reeves and Pettitt [2005| 


[2009 ; Toni et al. 2009| ; Peters et al. 2012[ ), 


Ratmann et al. 

requires multiple generated datasets t^,... ,t^, for each evaluation of the marginal posterior dis¬ 
tribution 7rM{9\ty). As with standard Monte Carlo approximations, Vaic[TrM{9\ty)] reduces as S 
increases, with lim^-^oo Var[7rM(^|ii/)] = 0. For the marginal posterior distribution, the quantities 
^1:5 ggj.yg only as a means to estimate vrjvf(0|fy), and do not otherwise enter the model explicitly. 
The number of samples S directly impacts on the variance of the estimation. 
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3 Estimation of Bayesian LOB stochastic agent model via Population- 
based samplers 


Population-based likelihood-free samplers were introduced to circumvent poor mixing in MCMC 


samplers (Sisson et al. [2007]; [Toni et al. 

[2009 ; 

Beaumont et al. 

[2009 ; 

Peters et al. 

|20T2 ; 

Del Moral et al. 

2012 ). These samplers propagate a population of particles, 0H1, ..., , with 


associated importance weights through a sequence of related densities 0i(0i),..., (/>t(0t), 

which dehnes a smooth transition from the distribution (/>i, from which direct sampling is available, 
to (pT the target distribution. 


For likelihood-free or ABC samplers, (pk is dehned by allowing Kh^ to place greater density 

on regions for which ty ^ t ask increases (that is, the bandwidth hn decreases with n). Hence, we de¬ 
note 7Tj,niO,t^'-^\ty) oc Kh^{ty,t^-^)f{t^'-^\6)Tr{e) and 7rM,n{^\ty) OC 7r(0) fj-s Kh„{ty,t^-^)f{t^-^\6)dt^-^ 
for n = 1,... ,T, under the joint and marginal posterior models respectively. 


3.1 Sequential Monte Carlo-based samplers 


SMC methods have emerged out of the helds of engineering, probability and statistics in recent 
years. Variants of the methods sometimes appear under the names of particle hltering or interacting 
particle systems (e.g. Ristic et al. 2004 Doucet et al. 2001 , Del Moral [2004]), and their theoretical 


properties have been extensively studied by |Crisan and Doucet 
[2004 , and Kunsch [2005 


2002j , Del Moral [2004 , Chopin 


The standard SMC algorithm involves Ending a numerical solution to a set of hltering recursions, 
such as hltering problems arising from nonlinear/non-Gaussian state space models. Under this 
framework, the SMC algorithm samples from an (often naturally occurring) sequence of distribu¬ 
tions TTn, indexed by n = 1,..., T. Each distribution is dehned on the support = E x E x ■ ■ ■ x E 
for some generic space denoted E. Recently, this class of algorithms was adapted to tackle the same 
class of problems typically addressed by MCMC methods where one has instead a sequence of dis¬ 
tributions {T^n}n>i sach dehned on hxed support E. NOTE: this is not a product space but a 
hxed space E. Del Moral et al. [2006 , Peters 2005 , Peters et al. 2012j and Targino et al. 2015j 
generalize the SMC algorithm to the case where the target distributions 7r„ are ah dehned on the 
same support E. This generalization, termed the SMC sampler, adapts the SMC algorithm to 
the more popular setting in which the state space E remains static, that is, the settings we have 
discussed earlier with regard to the MCMC algorithms. 

In short, the SMC sampler generates weighted samples (termed particles) from a sequence of 
distributions 7r„, for n = 1,..., T, where ttt may be of particular interest. We refer to ttt as the 
target distribution such as a posterior distribution for model parameters. Procedurally, particles 
obtained from an arbitrary initial distribution tti, with a set of corresponding initial weights, are 
sequentially propagated through each distribution vrt in the sequence via three processes, involving 
mutation (or move), correction (or importance weighting), and selection (or resampling). The hnal 
weighted particles at distribution nT are considered weighted samples from the target distribution 


TT. 


Hence, given a sequence of distributions {'?rn(d0)}„=i, the aim is to develop a large collection 

{ (*) (i) 1 ^ (i) 

Hn ', ©nI such that Hvi ' > 0 and 


^^1 = 1. These importance weights and samples, denoted by ©n^ 


} • 

J i=l 


are known as 
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particles (hence the name often given to such algorithms as particle filters or interacting particle 
systems). For such approaches to be sensible we would require that the empirical distributions 
constructed through these samples should converge asymptotically {N —?• oo) to the target distri¬ 
bution TTn for each time n. This means that for any 7r„ integrable function, denoted, for example, 
by 4>{0) : E ^ R' one would have the following convergence: 

N 

“ 4 -[</-(©)]. ( 20 ) 

^=1 

In the SMC Sampler algorithm, a particular variant of SMC algorithms, a modification of the 
SMC algorithm, is developed. Consider a generic sequence of distributions given by 7r„(0),n = 
1,..., T, with 6 € E, where the final distribution ttt is the distribution of interest. By introducing 
a sequence of backward kernels L^, a new distribution 

n —1 

...,9n)= TTniOn) Lk (0fc+i, 0fc) (21) 

k=l 


may be defined for the path of a particle (0i,..., On) G E"^ through the sequence vri,..., vTn. The 
only restriction on the backward kernels is that the correct marginal distributions 
f 7fn(0i ,..., 0n)d0i ,..., dOn-i = T^n{0n) are available. Within this framework, one may then work 
with the constructed sequence of distributions, under the standard SMC algorithm. 

In summary, the SMC Sampler algorithm involves three stages: 

1. Mutation, whereby the particles are moved from On-i to 9n via a mutation kernel Mn{9n-i, 9n)', 


2. Correction, where the particles are reweighted with respect to via the incremental impor¬ 
tance weight (Eq. 22); 


3. Selection, where according to some measure of particle diversity, commonly the effective 
sample size, the weighted particles may be resampled in order to reduce the variability of the 
importance weights. 


In more detail, suppose that at time n—1, the distribution ^n-i can be approximated empirically 
by ^n-i using iV-weighted particles. These particles are first propagated to the next distribution Hn 
using a mutation kernel Mn{9n-i, On), and then assigned new weights Wn = Wn-iWn {9i,... On), 
where Wn-i is the weight of a particle at time n — 1 and Wn is the incremental importance weight 
given by 


w, 


( 01 , ...,0n) = 


(01) • ■ • ) On) _ T^n {On) Ln—1 {On, 0n—l) 

7f„_i ( 01 , . . . , 0„_i) Mn (0„_l, 0n) vr„_i (0n-l) (0n-l, 0n) ' 


( 22 ) 


The resulting particles are now weighted samples from vr^. Consequently, from Eq. (22), under 


the SMC Sampler framework, one may work directly with the marginal distributions 'Kn{0n) such 
that Wn{0i ,..., On) = Wn{0n-i, On)- While the choice of the backward kernels Ln-i is essentially 
arbitrary, their specification can strongly affect the performance of the algorithm, as will be dis¬ 
cussed in the following subsections. The basic version of the SMC Sampler algorithm therefore 


proceeds explicitly as given in Algorithm 3.1 
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Remark 3.1 In all cases in which we utilize the incremental importanee sampling weight correc¬ 
tion, the arguments in the expressions only need to be known up to normalization. That is, it 
is perfectly acceptable to only be able to evaluate the sequenee of target distributions {vr„} up to 
normalization eonstant. This is true as long as the same normalization constant is present for all 
particles, sinee the renormalization step will correct for this lack of knowledge in the importance 
weighting. In praetiee, this is critical to the application of such methods. 


Sequential Monte Carlo Samplers 

1. Initialize the particle system; 

(a) Set n = 1; 

(b) For i = 1,..., draw initial particles 0^*^ 


•p{e)-, 


(c) Evaluate incremental importance weights |r(;i | using Equation (22) and normal¬ 

ize the weights to obtain 

Iterate the following steps through each distribution in sequence {'Xty^= 2 - 

2. Resampling 

(a) If the effective sampling size {ESS) = - ^ a < N^ff is less than a threshold 

then resample the particles via the empirical distribution of the weighted sample either 
by multinomial or stratified methods; see discussions on unbiased resampling schemes 
by Kiinsch 2005 and Del Moral [2004 . 


3. Mutation and correction 

(a) Set n = n -|- 1, if n = T -|- 1, then stop; 

(b) For i = 1,..., draw samples from mutation kernel 0n^ 


0 


»(*) 

’n-lI > 


Evaluate incremental importance weights |rcn^0n^^| using Equation (22) and 

|lFn*^| via 


normalize the weights to obtain 

IrW = ir(*) 


W 


(i) 




{Gn-l,&n) 


di) ..,(0 


(23) 


At this stage, for practitioners wishing to utilise such SMC methods, it is informative to under¬ 
stand better what properties are known about this class of estimation methods in terms of accuracy 
of such numerical approximations. We mention briefly some known results based on recent exam¬ 
ples of concentration inequalities for particle methods that are finite sample result (see discussion 
and references by Del Moral et al. 2013] ). 

The exponential concentration inequalities presented here are satisfied under some regularity 
conditions on the particle weights and the mutation kernel when defined on some general state 


space En\ see specific probabilistic details of these conditions by Del Moral 2004 
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Using the concentration analysis of mean field particle models, the following exponential es¬ 
timate can be obtained (see discussion by Del Moral 2004| ) and references therein. Note in the 
following when the N particle approximation to a distribution or density, such as vr, is used we will 
denote it by . 

Theorem 3.1 (Finite Sample Exponential Concentration Inequality) For any x > 0, n > 0, 

and any population size N >1, the probability of the event is 


Pr (kn (f) - T^n{F)\ < ^ (l + x + ^ + 


C2 

^/N 


x\ > 1 — e 


(24) 


where one defines the N particle sample estimator as follows: 

N 


vr. 


N 




){i) 

'n 


1=1 


and 


^ '^n (^n) dO^ 


(25) 


In the case of a stable SMC algorithm, that is, one that is insensitive to initial conditions, such 
as those we discussed earlier, the constants c and (ci,C 2 ) do not depend on the time parameter. 
One can also bound the difference between the particle estimate of the target distribution and the 
true distribution as follows. Consider that for any 6 = (0i)i<i<d and any (—oo,x] = 
cells in En = IR'^, we let 

Fn{x) = TTn (l(-oo,a;]) and F^(x) = VT^ (l(-oo,x]) ■ 

Using these definitions of the empirical particle constructed distribution function and the target 
distribution function at sequence number n in the sequence of distribution {vri, 7r2,..., vr^’}, we 
can state the following corollary for the distribution functions for sequence of densities vr^ given 
previously. 


Corollary 3.1 For any y > 0, n > 0, and any population size N > 1, the probability of the 
following event 


Vn ||F^ - F„|| < c ^d (y + 1) 


is greater than 1 — e ^. 

This concentration inequality ensures that the particle repartition function F^ converges to Ft, 
almost surely for the uniform norm. 

3.2 Sequential Monte Carlo samplers for intractable likelihood Bayesian models 

Formalizing this in the context of SMC Samplers for ABC posterior settings, the particle popu¬ 
lation 6n-i drawn from the distribution at time n — 1 is mutated to by the 

kernel M„(0„_i,0„). The weights for the mutated particles 6^ may be obtained as Wn{0n) = 
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Wn-i{0n-i)wn{'9n-i-,dn) where, for the marginal model sequence T^M,n{f^n\ty)-, the incremental 
weight is 


Wn ^n) — 


'^M,n—l{,^n—l\ty)^n {^n—lj ^n) '^M,n—l{^n—l\ty')Mji (^0n—l, On 


(26) 


where, following (19), and setting the kernel bandwidth to an ABC tolerance level e„ we obtain 

7r{6) ^ 


^M,n{^n\ty) ■= K,^{ty - f) 


S = 1 


which is proportional to an (unbiased) estimate of 'n'M,n{dn\ty) based on S Monte Carlo draws 
~ f{t\On)- Here L„_i (0„, 0„_i) is a reverse-time kernel describing the mutation of 
particles from 4>n{0n) at time n to (j)n-i{On-i) at time n — 1. As with the ABC-MCMC algorithm. 


the incremental weight (26) consists of the “biased” ratio ^M,n{dn\ty)/'^n-i{(^M,n-i\ty) for finite 
5 > 1 . 

If we now consider a sequential Monte Carlo sampler under the joint model vrj^n(^) t^'^\ty), with 
the natural mutation kernel factorisation 


Mn[iOn-l,tl;^l), (0n,tn^)] = Mn{0n-l,0n) f i'^n\h 


s=l 


(and similarly for L„_i), following the form of ( |26[ ), the incremental weight is exactly 

^r \(0 \ (0 -tn)T^{ 0 n)Ln-l{ 0 n, 0 n-l) 

[(t'n-l, tn-11 In iJ ~ 1 7^ /+ +s \ t a \l\/r (Q a\' 

S Es ity - (^n-l, 6»n) 


Hence, as the incremental weights (26, 27) are equivalent, they induce identical SMC algorithms 
for both marginal and joint models 'KM{0\ty) and 'Kj{6,t^'^\ty). As a result, while applications of 
the marginal sampler targeting 'KM{'9\y) are theoretically biased for finite S > 1, as before, they 
are in practice unbiased through association with the equivalent sampler on joint space targeting 
-Kj{e,t^-^\ty). 

We note that a theoretically unbiased sampler targeting 'KM{f^\ty), for all 5 > 1, can be obtained 
by careful choice of the kernel L„_i(0„, On-i)- For example, Peters [2005 , Peters et al. 2012j and 
Targino et al. [2015 all use the suboptimal approximate optimal kernel given by 




\ty)^n{^n—lj ^n) 

J" i^n—1 Ity'j^ni^n—lj ^n')dOn—l 


(28) 


from which the incremental weight (26) is approximated by 


Wn{,0n—\i0n) — ^M,n(^n|^j/)/ / '^M,n—lifin—l\ty)^n{,^n—li^n)dOn—l 


N 


^MA^n\ty)/ Y, Wn-l{eY)Mn{eY^ 0^). 


)(d 


(29) 


2=1 


Under this choice of backward kernel, the weight calculation is now unbiased for all 5 > 1, since 


the approximation TTM,n-i{(^\y) in the denominator of (26) is no longer needed. 
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3.2.1 Adaptive Schedules: Choice of sequence of ABC Bayesian distributions via 
annealed tolerance schednle 

In this section we consider how to take the ABC posterior distribution and construct a the sequence 
of distributions that are required for the SMC Sampler. That is, we address the question of how to 
develop an ABC specific sequence of target distributions. We have chosen to design this sequence 
by following what we call ABC reverse annealing. In particular, we construct a sequence of target 
posterior distributions {</>n }„>0 which are constructed based on strictly decreasing tolerance values, 
generically denoted by the sequence {en}n>o such that ei > e 2 > ... > > ... > ct- We obtain 

this sequence of ABC posterior distributions by considering the (j)n, which was defined with respect 
to the ABC likelihood involving a kernel. If we consider the kernel to have a decreasing bandwidth 
given by K^^{ty — t) then we will progressively place greater emphasis i.e. density on regions for 
which ty ^ t as n increases (that is, the bandwidth Cn decreases with n). Hence, we denote the 
two possible ABC constructions one may consider under the joint and marginal posterior models 
respectively, in the SMC Samplers procedure as follows; 

OC (fy, 

7TM,n{0\ty) (XTT{e) f k^^{ty,t^-^)f{t^-^\6)dt^-^ ^ ^ 

Jts 

Now the aspect of this procedure that makes it adaptive is the selection of the size of the dis¬ 
crepancy between and 7r„+i, for each nG{l,2...,T}as well as the final stopping point. In this 
paper we propose to perform adaption of the ABC target distribution sequence at every step. The 
aim is to progressively select a sequence of distributions, online such that the discrepancy between 
the next distribution and the previous, as controlled by the tolerance sequence, is controlled by 
the “fitness” or efficiency of the particle approximation of the previous target distribution in the 
sequence. A good approximation would indicate that one may take a larger step whilst a poorer 
approximation indicates that smaller steps should be taken. 

Formally, we perform the adaption such that a new tolerance Cn, at iteration n, is generated 
through a particle system based quantile matching procedure. The procedure adopted considers 
the new tolerance to be obtained as the solution for e in the following equation 

qn-i = eV2erf~^ [2F (q) - 1] , (31) 

where g is a user specified quantile level, F is the CDF of a normal distribution with mean 0 and 
standard deviation e and qn-i is the particle population quantile estimate obtained from the ABC 
posterior approximation after correction stage in the SMC Sampler ABC algorithm. In this way 
the tolerance schedule is continually adapting to the local particle approximation performance. In 
practice it is computationally efficient to employ the following adaptive schedule for the tolerance 
in the ABC posterior, where we ensure that the sequence of distributions is designed such that the 
new tolerance is calculated as a strictly decreasing schedule given by 

Cn = min((l - a)en-i,e*), (32) 

where a G (0,1). 
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3.2.2 Choice of Mutation Kernel 


There are many choices for mutation kernel that could be considered when designing an SMC 
Sampler algorithm. The choice of kernel is often critical to select in order for the algorithm to 
perform well. In this section, we first survey a few possible choices and then we present a specialized 
choice of mutation kernel we adopted from the genetic search literature [Li and Zhang[ 2009 which 
involves combination of mutation and cross-over operators for the particle mutation in the SMC 
Sampler. In order to utilise this class of mutation operator in SMC Sampler settings we had 
to formally write down not just the mutation and cross-over operators in a structural form, as 
typically specified in the NGSAII class of genetic optimization algorithms, but to also define their 
distributional form. We provide these at the end of the section. 

Some examples of possible choices of the mutation kernel are given as follows; 

1. Independent kernels. In this setting, one would select a mutation kernel given for all 
n E {1, 2,..., T} by {On-i, On) = Mn {On)] 


2. Local Random Walks. In this setting, the kernel would be selected for all 
n E {1,2, ...,T} to be of the form Mn{0n-i,0n), where the mutation from On-i to On 
follows a local Random Walk based around, say, a Gaussian smoothing kernel as given by 
Givens and Raftery [1996 ; 


3. Markov chain Monte Carlo Kernels. In this setting, the kernel would be selected for all 
n E {1, 2,..., T} to be an MGMG kernel of invariant distribution vr^. As noted by Del Moral 
et al. [2006] and Peters [2005 , this option is suitable if the Markov chain kernel is mixing 


rapidly or if the sequence of distributions is such that 7r„_i is close to 7r„, which is often 
the case by design. Then the use of an MGMG kernel would result in running for each 
stage, N inhomogeneous Markov chains. Then one must correct for the fact that one is not 
targeting the correct distribution under these Markov chains, which is achieved using IS: 

{0) and running L iterations of the Markov chain for each particle, 

^n-1 

where each of the N chains will target Oz^i > which is not in general 

TTn, then with an IS correction, such an approach is accurate and unbiased (i.e., targets the 
distribution of interest at time n given by vr^; 

4. Gibbs Sampler kernels. If the sequence of target distributions {vr„}„>o is such that its 
support is multivariate, then it may also be possible to sample from the full conditional 
distributions in the sequence of distributions. This approach allows one to undertake a Gibbs 
step, which would involve a kernel for update of the k-th element given in the form 


Mn {On—1^ dOn) — _j, {dOn^—k) '^n {0n,k\0n,—k) 


(33) 


with On-k = (^n, 1 ) 2 ) •••; ^n,fc-i) fc-i-i) where there are J parameters in the 

OpRisk model target posterior. If the full conditionals are not available, one could approxi¬ 
mate them accurately at each stage and then correct for the approximation error through IS; 


5. Mixture kernels. It is always possible to consider a mixture kernel choice given by 


M 


Mn {On—li On) — ^ ^ Oin,m {On—l) Mn^m {On—li On 


(34) 


m=l 
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with an,m [Qn-i) > 0 and Ylm=i ^n,m (^n-i) = 1- One special case of this type of kernel 
would be an independent kernel constructed by a kernel density estimate of Mn,m {dn-i, ^n) = 

Mn {On-1, On) for all m and an,m (^n-i) = bbnii with M = N] 

6. Partial Rejection Control kernels. In this case, one aims to construct a mutation kernel 
in the SMC Sampler that guarantees all sampled particles have importance weights with 
a “fitness” exceeding a user-specified threshold at each time n, denoted by Cn such that 
> Cn, Vi G {1, 2,..., A^}. To achieve this, one modifies any of the earlier mutation 
kernels to take the form given by 


KK-i,0n) =r(c„,0„_i 


(*) \-i 


min ^ 1, 


Wr, 


nf'O a 




(35) 


The quantity r{cn,0^^^) denotes the normalizing constant for particle 0)^^^, given by 


b) 


r{cn,0^^l^) = / min<( 1,W, 


(d 


Wn 


n—1 




(36) 


Note that 0 < r{cn,0n-i) < 1 if (w.l.o.g.) the mutation kernel Mn is normalized, so that 
f Mn(On-i, On)dOn = 1, and if the PRC threshold 0 < c„ < oo is finite. The sequence of 
PRC thresholds is then user-specified to ensure a certain particle “fitness” at each stage of 
the SMC Sampler. We will detail more explicitly this example in a future section. 


7. Genetic Mutation and Cross-Over operators. In this class of mutation kernel in the 
SMC Sampler we consider the class of genetic algorithm type mutations. In particular, we 
describe the class of MOEA mutation and crossover operators widely used in stochastic search 
algorithms, introduced in the method of Deb et al. [2002 . This class of mutation kernel is the 
most widely used operator in multi-objective optimisation and we demonstrate its adaption 
to the SMC Sampler framework. 

A disadvantage of the NSGA-II operators is that they are only able to mutate binary, integer, 
or real encodings of the output parameter vectors, whereas the stochastic process for the limit 
order submission activity by liquidity providers requires the specification of a positive definite 
and symmetric covariance matrix for the generation of intensities from a multivariate skew-t 
distribution. The positive definiteness and symmetry constraints of the covariance matrix 
will not be preserved if one simply employs the evolutionary operators above to produce 
new sets of covariance matrix candidate solutions. For this reason, Panayi and Peters 2015| 
introduced a new covariance mutation operator which generates new candidate covariance 
matrices which remains in the manifold of positive definite matrices. 

Simulated Binary Crossover (SBX): From two previous particles a new 

(i) 

solution is formed, where the /c-th elements is crossed as follows: 




(i,k) 




(37) 
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Here, /3 is a random sample from a distribution with density 


- ^ f (au) ’>c+i, if ^ 
1 ( 2 :^)’^’ otherwise. 


where u ~ ?7(0,1) and a = 2 — (3 with 

2 


/3 — 1 + 


fl(i.fc) _ oihk) 
“n-l “n-1 


mm 


>n-l - > ^ku - ^n-l 


(38) 


This would produce a mutation kernel for this type of move at SMC Sampler iteration n for 
the A:-th element of the i-th particle vector which would be updated according to a density 
given by 
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We use the crossover operator with probability pc = 0.7 and a distribution index r/c = 5. 
Every element k of the i-th particle vector is crossed with probability 0.5. 

Polynomial mutation: The mutation operator perturbs elements of the solution, according 
to the distance from the boundaries. 




where we have for 5 


- f [27 -|- (1 — 27)(1 — (5)’^™'*“^] _ 1 if 7 < 0.5 

[ 2 (l- 7 )-h 2 ( 7 - 0 . 5 )(l-( 5 )’?™+^]^ if 7 >0.5. 


with 
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where, 7 t/( 0 ,l). 

This would produce a mutation kernel for this type of move at SMC Sampler iteration n for 
the k-th. element of the i-th particle vector which would be updated according to a density 
given by 
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The distribution index rjrn = 10. The polynomial mutation operator is used with probability 

Pm = 0 . 2 . 

Covariance mutation operator: In the t-th generation of the MOEA, we generate | , i 
1.. .N from a mixture distribution defined as follows; 

M„(S«) = (1 - Wi)IW{^n,Pl) + WiIW{^,P2) 


where XW denotes the Inverse Wishart distribution, pi,P 2 are degrees of freedom param¬ 
eters with p 2 < Pi, and where wi is small so that sampling from the second distribution 
happens infrequently. Here T denotes an uninformative positive definite matrix, with the 
effect that sampling from the second distribution leads to moves away from the local region 
being explored, 'ht is also a positive definite matrix, fitted based on moment matching to 
the sample mean of the successfully proposed candidate solutions in the previous stage of the 
Multi-Objective optimisation as follows: 


= 


_^_y 


N 


w 


s;^N 


V — 

(i) n 

i=l 


ii) 

where r*' is the non-domination rank of the i-th solution in the s-th generation, and with 
tc < 1 is an exponential weighting factor. 


4 Application to equities LOB data: data description 

The data employed in this study constitutes the intra-day trading activity on the European multi¬ 
lateral trading facility (MTF) Chi-X Europe between January and April 2012. Chi-X Europe oper¬ 
ated as an individual entity from 2007, before being purchased by BATS Europe at the end of the 
trading period under consideration. We note that Chi-X Europe is a secondary exchange, i.e. the 
securities that are traded on the exchange are listed and primarily traded on national/supranational 
exchanges, including the London Stock Exchange, Euronext, Deutsche Boerse, and the SIX Swiss 
Exchange, amongst others. However, it maintains a significant proportion of the daily trading 
activity in each of these markets, between 20% and 35% in most case^ 

The complete dataset covers over 1300 assets, primarily stocks, but also including exchange- 
traded funds (ETFs) and American depositary receipts (ADRs). For the purposes of this study, we 
select one of the most commonly traded stocks in the French CAC 40 Index, namely BNP Paribas 
SA. Figure shows the evolution of the LOB on a typical day for this asset based on real market 
observation data from the LOB. We also present a heatmap of the inside spread St = Pt'^ — Pt’^ 
over the 2 month period February to March 2012. The inside spread is the most common measure 
of ‘liquidity’, i.e. the relative ease with which one can buy or sell a financial asset. 

Chi-X Europe operates both a visible and a hidden order book, and traders have the option 
to route orders to the hidden book if they meet certain conditions relating to order type and size. 
The dataset consists of only data in the visible book, after it has been processed by the exchange’s 
matching engine. That is, while the exchange allows for a range of order types with time-in-force 
modifiers, the processed data consists of the timestamps and order sizes of limit order submissions, 

'http://www.liquidmetrix.com/liquidmetrix/battlemap 
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Figure 2: (Left); A representation of real market data intra-day LOB states obtained from the 
trading activity for asset BNP Paribas SA on the 5th of March 2012. The shading of each box 
indicates the volume available at that price, which is volume available to buy for blue-bordered 
boxes and volume available to sell for red-bordered boxes. (Right): A heatmap of the intra-day 
spread for the period February to March 2012 for asset BNP Paribas SA. 


executions and cancellations. However, this data is sufficient to construct a much more detailed 
picture of the state of the LOB than is typically available in previous studies (which only consider 
aggregate volume in either the first level or the first 5 levels), as we can disaggregate the volumes 
per level up to any depth in the LOB. 

The raw, unevenly spaced data is thus used to construct the state of the LOB at each event 
timestamp (these are accurate up to millisecond precision). Because of our interest in fitting the 
auxiliary models describing price and volume dynamics (these are outlined in Section]^, however, 
we subsample the process at regular 10 second intervals, in order to then extract the price and 
volume variables of interest. Thus, from an irregularly spaced process typically containing between 
50,000 and 500,000 events every day, we extract a regular timeseries of the auxiliary model variables 
for the purposes of our estimation. 


5 Results 


The results presented in this section may be compared to those obtained from indirect inference 
procedures as reported in Panayi and Peters [2015 . To achieve comparison we have also provide 


the results for what they termed the benchmark ‘reference’ model, which makes a series of as¬ 
sumptions in order to simplify estimation and model structure. This basic reference model has the 
following parameter vector | ,'yo, as well as the covariance matrix S to 


be estimated, see details in Panayi and Peters [2015 . The results will be presented in terms of the 
ABC marginal posterior distributions of the individual parameters of the LOB simulation and in 
terms of the resulting stochastic agent based LOB model to reasonably produce realistic features 
of the simulated LOB intra-daily. 

we discussed the reduction of the 


In our introduction to likelihood-free methods in Section 2.2.1 
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observed data y to a low-dimensional vector of summary statistics ty. We are interested specifically 
in two of the most commonly studied LOB characteristics, which correspond to the volatility in 
the log returns obtained from the price process dynamic (as obtained from half the inside spread) 
and the evolution of the volume resting on the LOB (as measured by the instantaneous aggregate 
total volume on the bid and ask at levels 1 to 5). The summaries we adopt at this stage are 
less-standard in ABC applications since they employ a functional (i.e. regression model based) 
summary of features of observable LOB process. In this case the summary information becomes 
the model characterization (dimensional reduction) captured by the estimated model parameters 
fit to the real LOB data and the simulated LOB data for price or volume dynamic. Specifically we 
have; 

Auxiliary Model 1 - Price features: If we denote the mid-price as ^then the 

log return is defined as 


rt = In 


^mid 

Pt 


fy^TTlid 

Pt-At 


where is a suitable interval, in our case 1 minute. We fit a GARCH(1,1) model for this aspect 
of the data parameterized by (3i. 

Auxiliary Model 2 - Volume features: We fit an MA(1) model to the detrended total volume 
(i.e. an ARIMA(0,1,1) model) in the first 5 levels on both the bid and ask side parameterized by 
(32, in order to capture the time series structure of the LOB volumes. 

The auxiliary models are ht to both the real and simulated data, and for the distance we 
estimate the Euclidean distances between the auxiliary parameter vectors 


V2 = v(p2{y)My*m). 


5.1 Estimation algorithm configuration 

To perform the estimation there are also a number of inputs to the SMC Sampler ABC algorithm 
that we specify, including the number of particles, the tolerance schedule forced decrement amount 
and the total number of iterations over which to run the estimation. Specifically, we have for our 
estimation procedure: 

• The estimation procedure was run for 20 iterations.; 


The tolerance schedule employed was the forced decrement schedules specified in Section 


3.2.1, with a decrement parameter a = 0.1.; 


• We obtain results using 50, 100 and 200 particles per iteration.; 


• We also tested the quality of the results for a series of quantile levels for the tolerance, i.e. 
?o. 5 ) ®.75 and g'o.g- 

Carrying out the estimation procedure for each configuration above indicated that the best 
results (in terms of the lowest values of Di,D 2 ) were obtained for a quantile level go .9 for the 
tolerance, and 200 particles. We repeated the estimation procedure 20 times with this configuration 
and configuration above and Figure shows the evolution of the tolerance in the ABC posterior in 
the case of the forced tolerance schedule, when the estimation is run for T = 20 iterations. 
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We note that the mutation operator for the covariance matrix, specified in Section 3.2.2, which 
was composed of both an exploration and a mutation component, could lead to particle degeneracy 
in higher dimensions. Consequently, in practice it can be computationally more efficient to simplify 
the mutation kernel for the covariance matrix to a static mutation kernel which would eliminate 
the prior weighting in the numerator and denominator of each incremental particle weight. When 
this was performed it produces particle systems which less degeneracy issues in higher dimensions. 

Secondly, due to the nature of the crossover operator, it is possible for a particle to cross with 
an identical particle, for example if the two particles were produced in the resampling step of the 
previous iteration. In our estimation we explicitly exclude this possibility and where a particle 
is chosen to cross with an identical particle, it is instead mutated using the operator specified in 
Section 13.2.21 
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o 

'w 

Q. 

CD 



Iterations 


Figure 3: The adaptively estimated tolerance schedule obtained from multiple SMC Sampler-ABC 


runs on real data for BNP Paribas on 05/03/2012 specified in Section 3.2.1 


5.2 Final particle fitness and distributions of parameters 

Having run the SMC Sampler-ABC algorithm on the BNP Paribas LOB data for 05/03/2012 we 
obtained estimates of the posterior for the agent based LOB simulation model. The first set of 
results shows the accuracy of the LOB model to replicate features of the real LOB stochastic 
process relating to price and volume dynamics. This is clearly illustrated in Figure]^ in terms of 
the values of the objective functions 'Di,'D 2 for each of the particles at the final iteration stage 
of the SMC Sampler-ABC algorithm. This is the standard way in which results for optimisation 
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Figure 4: The realized objective function (distance metrics Di and P 2 ) values from each partcle in 
the SMC Sampler-ABC algoirthm at the hnal iteration for independent trials on the real data for 
BNP Paribas on 05/03/2012. The x-axis is the GARCH(1,1) model parameter distance discrep¬ 
ancies for the intra-day volatility dynamic of the price process. The y-axis is the ARIMA(0,1,1) 
model parameter distance discrepancies for the intra-day volume process dynamics. 


using multi-objective evolutionary algorithms (MOEAs) are presented (see discussion in Panayi 


and Peters 2015| ), in order to show the Pareto optimal front that is obtained in that setting, see 


the discussion in the Section 15.31 

We also present realisations of the LOB intra-day evolution for both the particle with the highest 
weight and the weighted mean of the particles in Figure We note that there are differences in 
the intra-day dynamics of the simulated hnancial market resulting from different repetitions of the 
estimation procedure. However, we note that for a subset of particles we can recover price and 
volume dynamics that are similar to those observed in the real market (an example of which we 
had seen in Figure]^. 

To complete the analysis we also illustrate the median of the resulting marginal posterior dis¬ 
tributions for the model parameters obtained from 20 independent runs of the SMC Sampler-ABC 
algorithm for the BNP Paribas data on 05/03/2012. These results are presented in Figure]^ 


5.3 Results Comparison to MOEA-II procedure 


The method introduced in [Panayi and Peters [2015| is a combination of simulation-based indirect 
inference (II) and multi-objective optimisation, denoted the Multi-objective-II estimation frame¬ 
work. In common with ABC, II is used when one cannot write down the likelihood of the data 
generating model in closed form, but realisations are easily obtained via simulation given model 
parameters 6. II introduces a new, ‘auxiliary’ model (with parameter vector (3), which is fit to a 
transform of both the real and simulated data {y and y*{0), respectively) and the objective is to 
find the model parameter vector 6 which minimises some distance metric D{l3(y), f3{y*(9)). 

The multi-objective extension to the standard II procedure pertains to the objective function 
D{P{y), f3{y*{6)). Where standard II procedures consider a scalar output of the objective function, 
the Multi-objective-II method considers a vector-valued output, where each element of the vector 
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Figure 5: Representations of simulated intra-day LOB states obtained from using the (Top): MAP 
particle from a single estimation procedure and (Bottom): MMSE particle estimates. 


pertains to a different feature of the LOB stochastic process. In this framework, the search is 
then for non-dominated parameter vectors, i.e. such that there is no parameter vector in the 
search space that can unilaterally improve a single criterion (objective function element) without 
worsening another criterion. The procedure uses the same mutation and crossover kernels outlined 


(ms . 


in Section 3.2.2 and outputs a set of Pareto optimal solutions, see details in Panayi and Peters 


Where the SMC-ABC method returns a family of particles and associated weights, the MOEA- 
II procedure returns a family of particles and their non-domination rank. Our comparison is then 
between the highest weighted particles returned from the former procedure, and the non-dominated 
particles returned from the latter. The results have been found to be comparable between the two 
methods, both in terms of achieving similar objective function values and in terms of producing 
simulations which resemble real financial markets. That is, while not all highest weight/non- 
dominated particles will give rise to realistic financial market simulations, there is a subset that 
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Figure 6: Representations of simulated intra-day LOB states obtained from using the (Top): MAP 
particle from a single estimation procedure and (Bottom): MMSE particle estimates. 


do. 

While we have tried to provide a fair comparison between the two methods by utilising the 
same mutation and crossover operators, we should highlight some differences between the MOEA- 
II procedure and the SMC-ABC procedure presented in this paper. Firstly, the MOEA-II procedure 
did not suffer from particle degeneracy when using the adaptive mutation kernel for the covariance 
mutation operator, and thus the operator in Section 3.2.2| was utilised as described. Secondly, where 
the probability of crossover between particles in the MOEA-II procedure was set at the default value 
of 0.7 in every iteration, this was found to cause additional particle degeneracy issues and thus the 
probability was reduced to 0.05 (with the additional exclusion of crossing with identical particles 
described at the front of this section). 
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Figure 7: Heatmaps of the intra-day value of the spread for (Left): The MAP particle from the 
estimation procedure and (Right): MMSE particle estimates. 


6 Conclusion 

This chapter has proposed a stochastic agent based liquidity supply and demand based simulation 
based model to characterize the LOB for an asset traded on an electronic exchange. The calibration 
of this model to real market LOB data has been performed via a posterior inference procedure that 
adopted an ABC structure due to the complexity of writing down the resulting likelihood for the 
LOB agent simulation model. The estimation of the posterior distribution was then shown how to 
be performed via an adaptive SMC Sampler-ABC algorithm. The results were tested on real data 
and compared to an indirect inference procedure with multi-objective optimization features. 
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